AMENDMENTS TO THE CLAIMS

This listing of claims will replace all prior versions, and listings, of claims in the application: 
Listing of Claims:

1. (Previously presented) A method of determining cluster attractors for a plurality of
documents, each document comprising at least one term, each term comprising one or more 
words, the method comprising: calculating, in respect of each term, a probability distribution 
indicative of the frequency of occurrence of the, or each, other term that co-occurs with said term 
in at least one of said documents; calculating, in respect of each term, the entropy of the 
respective probability distribution; selecting at least one of said probability distributions as a 
cluster attractor depending on the respective entropy value. 
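
The three steps recited in Claim 1 can be illustrated with a minimal sketch. It is not the applicant's implementation: the function names, the use of raw co-occurrence counts normalized to sum to one, the base-2 logarithm, and the "entropy less than or equal to a threshold" selection rule are all assumptions made for illustration. Each document is assumed to be a list of already-extracted terms.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_distributions(docs):
    """For each term, count every other term in the documents that contain it,
    then normalize the counts into a probability distribution (assumed scheme)."""
    counts = defaultdict(Counter)
    for doc in docs:
        tf = Counter(doc)  # term frequencies within this document
        for term in tf:
            for other, n in tf.items():
                if other != term:
                    counts[term][other] += n
    dists = {}
    for term, c in counts.items():
        total = sum(c.values())
        dists[term] = {t: n / total for t, n in c.items()}
    return dists

def entropy(dist):
    """Shannon entropy in bits of a {term: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def select_attractors(docs, threshold):
    """Keep the distributions whose entropy does not exceed the threshold
    (hypothetical selection rule; the claim only requires dependence on entropy)."""
    dists = cooccurrence_distributions(docs)
    return {t: d for t, d in dists.items() if entropy(d) <= threshold}
```

For example, `select_attractors([["a", "b"], ["a", "b"], ["a", "c"]], 0.5)` keeps the distributions of `b` and `c`, each of which co-occurs only with `a`, and drops the broader distribution of `a`.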

2. (Original) A method as claimed in Claim 1, wherein each probability distribution comprises, 
in respect of each co-occurring term, an indicator that is indicative of the total number of 
instances of the respective co-occurring term in all of the documents in which the respective
co-occurring term co-occurs with the term in respect of which the probability distribution is
calculated. 

3. (Previously presented) A method as claimed in Claim 1, wherein each probability distribution 
comprises, in respect of each co-occurring term, an indicator comprising a conditional probability 
of the occurrence of the respective co-occurring term in a document given the appearance in said 
document of the term in respect of which the probability distribution is calculated. 

4. (Previously presented) A method as claimed in Claim 2, wherein each indicator is normalized 
with respect to the total number of terms in the, or each, document in which the term in respect 
of which the probability distribution is calculated appears. 

5. (Original) A method as claimed in Claim 1, comprising assigning each term to one of a
plurality of subsets of terms depending on the frequency of occurrence of the term; and selecting, 
as a cluster attractor, the respective probability distribution of one or more terms from each 
subset of terms. 

6. (Original) A method as claimed in Claim 5, wherein each term is assigned to a subset 
depending on the number of documents of the corpus in which the respective term appears.

7. (Previously presented) A method as claimed in Claim 5, wherein an entropy threshold is 
assigned to each subset, the method comprising selecting, as a cluster attractor, the respective 
probability distribution of one or more terms from each subset having an entropy that satisfies the 
respective entropy threshold. 

8. (Original) A method as claimed in Claim 7, comprising selecting, as a cluster attractor, the 
respective probability distribution of one or more terms from each subset having an entropy that 
is less than or equal to the respective entropy threshold. 

9. (Previously presented) A method as claimed in Claim 5, wherein each subset is associated with 
a frequency range and wherein the frequency ranges for respective subsets are disjoint. 

10. (Previously presented) A method as claimed in Claim 5, wherein each subset is associated 
with a frequency range, the size of each successive frequency range being equal to a constant 
multiplied by the size of the preceding frequency range in order of increasing frequency. 

11. (Previously presented) A method as claimed in Claim 7, wherein the respective entropy
threshold increases for successive subsets in order of increasing frequency. 

12. (Original) A method as claimed in Claim 11, wherein the respective entropy threshold for 
successive subsets increases linearly.
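
The subset scheme of Claims 5 to 12 can be sketched as follows. This is an illustrative reading only: the parameter names (`first_width`, `ratio`, `base`, `step`) and their default values are hypothetical, and document frequency is assumed to start at 1. Frequency ranges are disjoint, each is `ratio` times as wide as the preceding one, and the entropy threshold grows linearly with the subset index.

```python
def band_index(doc_freq, first_width, ratio):
    """Index of the frequency range containing doc_freq (assumed >= 1).
    Range 0 spans first_width frequencies; each later range is ratio
    times as wide as the one before it."""
    upper = 1 + first_width  # exclusive upper bound of range 0
    width = first_width
    k = 0
    while doc_freq >= upper:
        width *= ratio
        upper += width
        k += 1
    return k

def band_threshold(k, base, step):
    """Entropy threshold for subset k, increasing linearly with k."""
    return base + step * k

def select_by_band(doc_freqs, entropies, first_width=2, ratio=2, base=1.0, step=0.5):
    """Return the terms whose entropy is at or below their subset's threshold."""
    selected = []
    for term, df in doc_freqs.items():
        k = band_index(df, first_width, ratio)
        if entropies[term] <= band_threshold(k, base, step):
            selected.append(term)
    return selected
```

With the default values, document frequencies 1 to 2 fall in subset 0, 3 to 6 in subset 1, and 7 to 14 in subset 2, with thresholds 1.0, 1.5 and 2.0 respectively.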

13. (Cancelled) 

14. (Previously presented) An apparatus for determining cluster attractors for a plurality of 
documents, each document comprising at least one term, each term comprising one or more 
words, the apparatus comprising: means for calculating, in respect of each term, a probability 
distribution indicative of the frequency of occurrence of the, or each, other term that co-occurs 
with said term in at least one of said documents; means for calculating, in respect of each term, 
the entropy of the respective probability distribution; and means for selecting at least one of said 
probability distributions as a cluster attractor depending on the respective entropy value. 

15. (Currently amended) A method of clustering a plurality of documents, each document 
comprising at least one term, each term comprising one or more words, the method comprising 
determining cluster attractors in accordance with the method of Claim 1; comparing each
document with each cluster attractor; and assigning each document to one or more cluster
attractors depending on the similarity between the document and the cluster attractors.

16. (Original) A method as claimed in Claim 15, comprising: calculating, in respect of each 
document, a probability distribution indicative of the frequency of occurrence of each term in the 
document; comparing the respective probability distribution of each document with each 
probability distribution selected as a cluster attractor; and assigning each document to at least one 
cluster depending on the similarity between the compared probability distributions. 
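
The comparison and assignment steps of Claim 16 can be sketched as below. The claims do not name a similarity measure; Jensen-Shannon divergence is used here purely as an assumed stand-in, and assigning each document to its single closest attractor is likewise an assumption (the claim allows assignment to more than one cluster). All function names are hypothetical.

```python
import math
from collections import Counter

def term_distribution(doc):
    """Relative frequency of each term in one document (a list of terms)."""
    tf = Counter(doc)
    total = sum(tf.values())
    return {t: n / total for t, n in tf.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions,
    1 for distributions with no terms in common."""
    total = 0.0
    for t in set(p) | set(q):
        pt, qt = p.get(t, 0.0), q.get(t, 0.0)
        m = 0.5 * (pt + qt)
        if pt > 0:
            total += 0.5 * pt * math.log2(pt / m)
        if qt > 0:
            total += 0.5 * qt * math.log2(qt / m)
    return total

def assign(docs, attractors):
    """Map each document index to the name of its closest attractor."""
    result = {}
    for i, doc in enumerate(docs):
        d = term_distribution(doc)
        result[i] = min(attractors, key=lambda name: js_divergence(d, attractors[name]))
    return result
```

Here `attractors` is assumed to be a mapping from an attractor name to its `{term: probability}` distribution.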

17. (Previously presented) A method as claimed in Claim 16, comprising organizing the 
documents within each cluster by: assigning a respective weight to each document, the value of 
the weight depending on the similarity between the probability distribution of the document and 
the probability distribution of the cluster attractor; comparing the respective probability 
distribution of each document in the cluster with the probability distribution of each other
document in the cluster; assigning a respective weight to each pair of compared documents, the 
value of the weight depending on the similarity between the compared respective probability 
distributions of each document of the pair; calculating a minimum spanning tree for the cluster 
based on the respective calculated weights. 
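
The final step of Claim 17 can be illustrated with a short sketch of Prim's algorithm. It assumes the pairwise weights have already been computed and stored in a symmetric matrix, and that a lower weight means a more similar pair; both points are assumptions, as is the function name. How the document-to-attractor weights enter the tree is not modelled here.

```python
def minimum_spanning_tree(weights):
    """Prim's algorithm on a symmetric matrix of pairwise weights.
    Returns a list of (i, j, weight) edges connecting all nodes."""
    n = len(weights)
    if n == 0:
        return []
    in_tree = {0}  # start growing the tree from node 0
    edges = []
    while len(in_tree) < n:
        # cheapest edge from a node inside the tree to a node outside it
        best = min(
            ((i, j, weights[i][j]) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: e[2],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges
```

For three documents with weights `[[0, 1, 4], [1, 0, 2], [4, 2, 0]]`, the tree links document 0 to document 1 and document 1 to document 2, leaving out the costlier direct edge between documents 0 and 2.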

18. (Previously presented) A computer-implemented method of clustering a plurality of 
documents, each document comprising at least one term, each term comprising one or more 
words, the method including: causing a computer to calculate, in respect of each term, a 
probability distribution indicative of the frequency of occurrence of the, or each, other term that 
co-occurs with said term in at least one of said documents; causing the computer to calculate, in 
respect of each term, the entropy of the respective probability distribution; causing the computer 
to select at least one of said probability distributions as a cluster attractor depending on the 
respective entropy value. 